Segmenting and Tagging Structured Content
Abstract
Bilingual dictionaries hold great potential as a source of lexical resources for training automated systems for optical character recognition, machine translation, and cross-language information retrieval. More importantly, they represent a class of documents that have a great deal of structure, and this structure is not fundamentally spatial. Structure is provided by the authors (or publishers), who use different fonts, font styles, spacing, and special symbols to implicitly convey information about the structure of the content. Our system is divided into three phases: Dictionary Segmentation, Entry Tagging, and Generation. In segmentation, pages are divided into logical entries based on structural features learned from selected examples. The extracted entries are associated with functional labels and passed to a tagging module that associates linguistic labels with each word or phrase in the entry. The output of the system is a structure that represents the entries of the dictionary. In this document, we discuss the fundamental image processing approach that lets us extract this structure. Content tagging is described only briefly, and references to related publications are provided.
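The three phases named in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the `Token` type, the rule that a bold token at the start of a line opens a new entry, and the `STYLE_TO_TAG` mapping are all hypothetical stand-ins for the structural features the abstract says are learned from examples.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    style: str        # assumed font styles: "bold", "italic", or "regular"
    line_start: bool  # True if the token begins a printed line


# Hypothetical style-to-label mapping; a real system would learn this from examples.
STYLE_TO_TAG = {"bold": "headword", "italic": "pos", "regular": "translation"}


def segment(tokens):
    """Phase 1 (toy rule): start a new entry at each bold token that begins a line."""
    entries = []
    for tok in tokens:
        if (tok.style == "bold" and tok.line_start) or not entries:
            entries.append([])
        entries[-1].append(tok)
    return entries


def tag(entry):
    """Phase 2 (toy rule): label each token according to its font style."""
    return [(STYLE_TO_TAG.get(tok.style, "unknown"), tok.text) for tok in entry]


def generate(tagged_entries):
    """Phase 3: emit one record per entry, grouping token texts under each label."""
    records = []
    for tagged in tagged_entries:
        record = {}
        for label, text in tagged:
            record.setdefault(label, []).append(text)
        records.append(record)
    return records


def run_pipeline(tokens):
    return generate([tag(entry) for entry in segment(tokens)])


# Made-up page content, used only to exercise the sketch.
page = [
    Token("chat", "bold", True), Token("n.", "italic", False), Token("cat", "regular", False),
    Token("chien", "bold", True), Token("n.", "italic", False), Token("dog", "regular", False),
]
records = run_pipeline(page)
```

Here `records` holds one dictionary per detected entry. The point of the sketch is only the hand-off between phases: segmentation decides where entries begin, tagging labels the pieces inside each entry, and generation produces the structured output.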
Similar resources
Learning by tagging: The role of social tagging in group knowledge formation
This research presents a case study on the use of Social Tagging in an undergraduate classroom at the University of Michigan during the Fall 2005 semester. Students were between 20 and 22 years of age. Students tagged their individual blog posts to contribute to themes and conversations in an online learning environment. Using content analysis of the blog posts and tags as well as semi-structur...
On the Construction of Efficiently Navigable Tag Clouds Using Knowledge from Structured Web Content
In this paper we present an approach to improving the navigability of hierarchically structured Web content. The approach is based on integrating a tagging module and adopting tag clouds as a navigational aid for such content. The main idea is to apply tagging to better highlight cross-references between information items across the hierarchy. Alt...
Basic Principles for Segmenting Thai EDUs
This paper proposes a guideline to determine Thai elementary discourse units (EDUs) based on rhetorical structure theory. Carson and Marcu’s (2001) guideline for segmenting English EDUs is modified to propose a suitable guideline for segmenting EDUs in Thai. The proposed principles are used in tagging EDUs for constructing a corpus of discourse tree structures. It can also be used as the basis ...
Towards the Semantic Web: Collaborative Tag Suggestions
Content organization over the Internet went through several interesting phases of evolution: from structured directories to unstructured Web search engines and more recently, to tagging as a way for aggregating information, a step towards the semantic web vision. Tagging allows ranking and data organization to directly utilize inputs from end users, enabling machine processing of Web content. S...
Segmenting Cardiac MRI Tagging Lines using Gabor Filter Banks
This paper describes a new method for the automated segmentation and extraction of cardiac MRI tagging lines. Our method is based on the novel use of a 2D Gabor filter bank. By convolving the tagged input image with our Gabor filters, the tagging lines are automatically enhanced and segmented out. We design the Gabor filter bank based on the input image’s spatial and frequency characteristics. ...
Publication date: 2003